ABSTRACT


A  concept  for  an  apparatus  which  visually  displays  and  responds 
to  the  first  and  second  formant  of  vowel  sounds  is  developed.   The 
machine  is  intended  for  use  by  deaf  and  speech  handicapped  children 
in  learning  to  produce  voiced  sounds.   System  design  and  principles 
applied to realize a physical prototype of this concept are presented.
The  complete  electronic  and  mechanical  design  plus  fabrication  of  the 
automatic  electronic  speech  training  responder  is  described  in 
detail.   Schematic  diagrams  of  all  electronic  circuitry  employed 
and  photographs  of  the  prototype  equipment  are  included.   The 
apparatus  is  on  loan  to  the  Monterey  Institute  for  Speech  and 
Hearing, Monterey, California, for clinical testing and evaluation.

1. INTRODUCTION

Man  is  born  with  the  natural  instinct  and  physical  capacity 
to  eat  and  breathe,  but  he  must  learn  how  to  speak.  This  learning 
process depends on good hearing ability during the formative years.
A child, during initial attempts to speak, constantly monitors his
utterances with his ears. These sensors provide the necessary
information to the brain to modify the vocal tract modulators and
articulators with respect to the points of articulation until the
desired sound, phoneme or word is correctly produced. If this
feedback loop (voice output-ear sensor-brain input) is defective
or nonexistent in a human, another physical sensor must be used
as an alternate feedback path to monitor generated speech sounds
on a real time basis if intelligible and comprehensible
communication is to be achieved. Many devices have
been  devised  and  constructed  which  transform  speech  sounds  into 
a  visual  display  or  a  tactile  signal. 

This  thesis  is  directed  toward  the  attempt  to  process 
specific  speech  sounds  and  to  display  or  provide  a  positive  response 
when  the  desired  sound  has  been  correctly  produced.  In  addition, 
the  machine  must  be  simple  in  final  output  so  that  it  can  be  easily 
used  and  interpreted  by  children. 

Computer sciences have stimulated research into speech recognition
and synthesis. Unfortunately, this type of engineering
technology is too costly and sophisticated at the present time for
application to elementary speech training problems. Rather specific
guidelines on the needs of training devices for children were
developed by Dr. Burl Gray of the Monterey Institute for Speech
and Hearing; these are:

1. A definite need exists for simple, inexpensive devices
which  will  assist  or  supplement  the  speech  therapist's 
work  with  deaf  children.   These  devices  would  permit  the 
instructor  to  teach  more  students  simultaneously  or  the 
devices  could  perform  elementary  tasks  of  providing 
various  mechanical  responses  to  repetitious  articulation 
drills  without  the  constant  attention  or  intervention  of  the 
speech  therapist. 
2. The information display or mechanical response of such an
apparatus must be in a form which is easily communicable
to and understood by the child. Careful attention must
be given to the human-machine interface problem to ensure
good results with a given age group and mental attitude.
3. The apparatus must present the visual or mechanical
response while the child is speaking (i.e. in real time).

Using these criteria, an attempt has been made to design and
construct an apparatus which will respond only to a defined
pronunciation of the basic American vowel sounds. The vowel sounds
were selected for machine recognition because they require the
minimum amount of audio spectral information to be uniquely
identified. However, the approach to this vowel processing
technique is sufficiently general that it may be extended
to process other sounds.

Figure 1 is a graphic representation of a generalized man-machine
speech feedback system.

2. THE VOWEL SOUND

A  human  can  produce  a  multitude  of  speech  sounds  by  controlling 
his articulators (the tongue and lower lip), the points of
articulation (the upper lip, alveolar ridge, the hard palate, the soft
palate or velum and lower teeth), and the excitation of his vocal
bands. The vocal bands, if tensed and therefore vibrating, modulate
the air stream exhaled from the lungs to establish a category of
sounds which are classed as voiced sounds. All vowels are voiced
sounds which are excluded from entry into the nasal cavity by a
raised velum and therefore emanate solely from the oral cavity.

It  will  become  apparent  that  the  vowels  are  constrained  to  a 
small  category  of  speech  sounds  by  definition  of  the  manner  in 
which  they  are  articulated.  In  fact,  the  basic  American  vowels 
consist  of  10  phonemes.  Table  1  lists  the  individual  sounds  with 
their phonetic notation and representative words. [10, 30, 33]

Since  this  thesis  is  devoted  to  application  of  electronic 
techniques  to  speech  processing,  it  is  natural  to  begin  with  a 
machine  which  will  react  to  the  most  fundamental  sounds  which 
require  the  minimal  spectral  information  to  be  recognized  or 
identified.  The  vowel  can  be  specified  by  a  minimum  of  two  spectral 
parameters  in  most  sound  situations.  Joint  discussions  with  Dr. 
Gray  and  Dr.  Ewing  resulted  in  establishing  a  mutually  acceptable 
concept  of  an  electronic  vowel  teaching  machine.  This  local  merger 
of  ideas  from  two  disciplines  proves  once  again  that  scientific 
boundaries  can  greatly  overlap  and  the  systems  engineering  approach 
to  problems  may  be  of  great  benefit  to  all  concerned. 

TABLE 1
VOWEL PHONETIC SYMBOLS AND REPRESENTATIVE WORDS

Typewritten
Symbol for     IPA
Vowel          Symbol     Representative Words

IY             i          heed      beat      eat
I              ɪ          hid       bit       it
E              ɛ          head      bet       let
AE             æ          had       bat       hat
A              ɑ          hod       calm      father
OW             ɔ          hawed     fall      lost
U              ʊ          hood      full      foot
OO             u          who'd     fool      pool
UH             ʌ          hud       above     tub
ER             ɝ          heard     word      hurt

The theory of vowel production can be described in terms of
steady state (or harmonic) conditions with application of
cord-tone-resonance effects. [21] Modern analytical representation
of the same effect can be stated in terms of excitation functions
and convolution techniques. [6] The former description states in
effect that the vocal bands (a modern synonym for cords [3]),
during phonation, set up in the air immediately adjacent to them
a complex motion which consists of a fundamental component, known
as pitch, and a large number of its overtones or harmonics. This
complex air motion constitutes the so-called band-tone. The
theory further states that the vocal cavities, on which the
band-tone acts as a force, have the properties of simple resonators and
thus serve to modify the spectrum of energy flowing from the bands.
In terms of this theory, a vowel sound, as emitted from the mouth,
is due to both selective generation and selective transmission plus
radiation. This sound is composed mainly of harmonic components
of the fundamental, each of which has a determinable magnitude.
For example, the greatest magnitudes of the harmonic components
usually are found to exist for the 6th through 9th component and
13th through 16th component for the particular vowel sound /a/. [21]
Naturally, for other vowels, the oral cavities change in physical
dimensions, thus affecting the resonant properties of these chambers
and hence causing other harmonic components or partials of the
fundamental vibration of the vocal bands to be amplified or
attenuated.

Figure 2. Typical Spectrograms of the vowels by a male voice.

The spectrograph has greatly enhanced the study of speech
sounds and in particular vividly identifies the amplified partial
tones or resonant frequencies uniquely identifiable with each vowel
sound. [32, 33] Figure 2 provides a sketch representing the
spectrographic tracings due to each vowel sound. The dark areas
represent the amplified harmonics of the fundamental pitch of the
voice. Note that these locations are unique for each vowel,
especially for the first and second resonant frequencies. In the
terminology of visible speech the dark bands are called "formant"
regions or "bars" and for reference purposes are designated by
number, the lowest on the frequency scale being bar 1 or first
formant F1, the next bar 2 or second formant F2, etc. In this
thesis, the notation F1 and F2 shall be used to designate the
first and second resonant frequencies of vowel sounds respectively.

The first and second formants are the only two pieces of
spectral information required, in most cases, to identify a particular
vowel. The third formant (F3) is helpful in distinguishing between
overlapping first and second formant frequencies. Potter and
Peterson have suggested that the human ear recognizes vowel sounds,
not by the spectral location of F1 and F2, but rather by the relative
frequency separation or difference between F1 and F2. [33] Table 2
lists the F1, F2 and F3 frequencies for the vowels of Table 1 while
Table 3 lists the relative formant amplitudes. [30] Figure 3 shows
a two dimensional plot of F1 vs F2. [9] This figure is the crux
of the apparatus designed to recognize vowel sounds. Note that
in the F1-F2 plane each vowel has a specific location; it
is also interesting to note that the locations of these sounds correspond
roughly to the position of the tongue in the oral cavity if you
imagine looking at a side view of the head.
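
The statement that each vowel occupies its own location in the F1-F2 plane can be illustrated with a small lookup. The sketch below is only an illustration and is not how the apparatus itself works: it takes the male-speaker F1 and F2 averages of Table 2 (with the F2 of OW read as 840 Hz) and returns the vowel whose average lies nearest to a measured pair. The function name `nearest_vowel` and the use of a logarithmic distance are assumptions made for this example.

```python
import math

# Average F1 and F2 (Hz) for male speakers, taken from Table 2.
# Keys are the typewritten vowel symbols of Table 1.
# Assumption: the F2 of OW is read as 840 Hz.
MALE_FORMANTS_HZ = {
    "IY": (270, 2290),
    "I": (390, 1990),
    "E": (530, 1840),
    "AE": (660, 1720),
    "A": (730, 1090),
    "OW": (570, 840),
    "U": (440, 1020),
    "OO": (300, 870),
    "UH": (640, 1190),
    "ER": (490, 1350),
}


def nearest_vowel(f1_hz, f2_hz):
    """Return the vowel symbol whose (F1, F2) average is closest to the input.

    Distance is measured on a logarithmic frequency scale (an assumption of
    this sketch) so that the larger F2 values do not dominate the comparison.
    """
    def distance(symbol):
        ref_f1, ref_f2 = MALE_FORMANTS_HZ[symbol]
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))

    return min(MALE_FORMANTS_HZ, key=distance)


if __name__ == "__main__":
    print(nearest_vowel(280, 2250))  # a pair close to the IY average
    print(nearest_vowel(720, 1100))  # a pair close to the A average
```

A nearest-neighbour rule always names some vowel, however poor the match; a fixed acceptance region around one target avoids that, at the cost of recognizing only the vowel it was set for.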

The vowel training device does not work on the relative location
of F1 to F2 but rather utilizes an electronic spectral window in the
F1-F2 plane to target a particular vowel sound or, for that matter,
any voiced combination of two oral resonances in this dual formant
plane.

TABLE 2
AVERAGES OF FUNDAMENTAL AND FORMANT FREQUENCIES

IPA              Fundamental      First          Second         Third
Symbol           Frequency (Hz)   Formant (Hz)   Formant (Hz)   Formant (Hz)

i        M       136              270            2290           3010
         W       235              310            2770           3310
         Ch      272              370            3200           3730

ɪ        M       135              390            1990           2550
         W       232              430            2480           3070
         Ch      269              530            2730           3600

ɛ        M       130              530            1840           2480
         W       223              610            2330           2990
         Ch      260              690            2610           3570

æ        M       127              660            1720           2410
         W       210              860            2050           2850
         Ch      251              1010           2320           3320

ɑ        M       124              730            1090           2440
         W       212              850            1220           2810
         Ch      256              1030           1370           3170

ɔ        M       129              570            840            2410
         W       216              590            920            2710
         Ch      263              680            1060           3180

ʊ        M       137              440            1020           2240
         W       232              470            1160           2680
         Ch      276              560            1410           3310

u        M       141              300            870            2240
         W       231              370            950            2670
         Ch      274              430            1170           3260

ʌ        M       130              640            1190           2390
         W       221              760            1400           2780
         Ch      261              850            1590           3360

ɝ        M       133              490            1350           1690
         W       218              500            1640           1960
         Ch      261              560            1820           2160

TABLE 3
FORMANT AMPLITUDES MEASURED RELATIVE TO /ɔ/

IPA        First           Second          Third
Symbol     Formant (db)    Formant (db)    Formant (db)

i          -4              -24             -28
ɪ          -3              -23             -27
ɛ          -2              -17             -24
æ          -1              -12             -22
ɑ          -1              -5              -28
ɔ          0               -7              -34
ʊ          -1              -12             -34
u          -3              -19             -43
ʌ          -1              -10             -27
ɝ          -5              -15             -20

Figure 3. Central Regions of First and Second Formant
Frequencies of the Common American Vowels.
(Horizontal axis: F1 (Hz); vertical axis: F2.)

3. AESTR AS A TEACHING AID

During  the  process  of  physically  realizing  a  prototype  of  the 
vowel  training  machine,  liberty  was  taken  by  the  author  and  his 
associates  in  the  electronics  laboratory  to  coin  a  name  for  this 
device. The result is Automatic Electronic Speech Teaching
Responder (AESTR). The title should convey the notion that this machine
is not intended to replace a speech therapist but rather to assist
him in his work. AESTR is initially preset through the
oscillator frequency dials and the control knobs located on the
front panel of the device. The child is then placed in a room with
a candy dispenser or some other motivational responder and a
microphone. He is asked to make any sound he cares to. As the
child produces various sounds he should produce the desired sound
in due time. The machine will only activate the candy dispenser when
the  child  has  produced  the  targeted  voiced  sound  and  the  child 
will  keep  trying  to  repeat  the  sound  in  order  to  maximize  his 
reward.  As  the  rewards  are  given  more  frequently,  the  teacher  is 
able  to  adjust  the  filter  bandwidths  on  AESTR  and  narrow  the 
spectral  window  of  the  desired  sound,  hence  increasing  the 
articulation  accuracy  required  of  the  child  if  he  is  to  obtain 
his  reward. 

The  child  learns  to  speak  desired  sounds  by  communicating 
directly  with  AESTR.  However,  positive  control  of  the  speech 
training process is available to the teacher through his ability
to vary six parameters from the front panel of AESTR (F1
bandwidth and sensitivity, F2 bandwidth and sensitivity, pitch
filter, microphone gain).

4. SYSTEM CRITERIA AND DESIGN

The system incorporates two basic electronic functions to
locate and measure, in real time, the first and second formants of
a voiced sound. The speech sound is first mixed with two local
oscillators by means of non-linear devices. One of the components
obtained from this process, the difference frequency between
each formant and its oscillator, is isolated by active low pass
filter circuits. The oscillators and low pass filters are
variable and can be set for a particular sound or spectral
window. By setting the two local oscillators to the known
frequencies for F1 and F2 of a particular vowel and the low pass
cutoff frequency for the desired degree of accuracy of response,
the machine is able to process the speech sound and provide a
binary decision response.
The responses are:

1. A positive response, which is movement of two voltmeter
indicators and a light being activated if both meters
are at maximum value simultaneously. This condition occurs
when the resonant frequencies of the voice correlate
with the preset local oscillator frequencies
simultaneously, that is, when the correct voiced sound is being
produced by the student. The apparatus also has an external
motivation output jack which can operate other reward
machines when the targeted sound is produced by the
student.
2. No response. One or both formant frequencies are not
present or they do not correlate within limits set by
the filter pass band.
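
The binary decision described above can be sketched in a few lines. This is a minimal model and not the circuit itself: it assumes ideal (brick-wall) low-pass filters, so a channel responds exactly when the difference between the voice formant and its local oscillator does not exceed the filter cutoff. The function names `channel_responds` and `aestr_response` are hypothetical; the target frequencies in the usage lines are the male averages for the vowel in "father" from Table 2.

```python
def channel_responds(formant_hz, oscillator_hz, cutoff_hz):
    """True when the difference frequency lies inside the low-pass pass band.

    Assumption: an ideal filter that passes everything up to cutoff_hz
    and nothing above it.
    """
    return abs(formant_hz - oscillator_hz) <= cutoff_hz


def aestr_response(f1_hz, f2_hz, osc1_hz, osc2_hz,
                   cutoff1_hz=60.0, cutoff2_hz=60.0):
    """Positive response only if both formant channels respond together."""
    return (channel_responds(f1_hz, osc1_hz, cutoff1_hz)
            and channel_responds(f2_hz, osc2_hz, cutoff2_hz))


if __name__ == "__main__":
    # Oscillators preset to F1 = 730 Hz and F2 = 1090 Hz.
    # A voice 30 Hz off on both formants passes the widest window ...
    print(aestr_response(700, 1120, 730, 1090))
    # ... but not once the windows are narrowed to 15 Hz.
    print(aestr_response(700, 1120, 730, 1090, cutoff1_hz=15, cutoff2_hz=15))
```

Narrowing the cutoff shrinks the rectangle in the F1-F2 plane that counts as correct, which is how the required articulation accuracy is raised.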


Many methods were considered for realization of this device in
terms of simplicity, cost and expediency. The primary concern was to
produce some type of primitive machine which would do the basic tasks
required by this particular vowel teaching aid. The approach finally
selected for the first attempt is to process the complex speech
waveform in analog form in the audio spectrum. Advantage was taken of
the Field-Effect Transistor (FET), which has an almost perfect square
law response and which is ideally suited for optimum mixing of
oscillator and voice frequencies. The filtering is accomplished by means
of active low-pass filters using the readily available integrated
circuit operational amplifiers.

An additional factor must be considered in AESTR's system design.
The pitch of a human voice can range from approximately 75 to 500 Hz.
[32] The formant frequencies range from approximately 250 to 3000 Hz.
It is necessary to eliminate the pitch frequency from the audio speech
prior to the mixing operation; otherwise it is possible for the pitch
or fundamental frequency to pass directly through the mixers and
filters, thus producing a positive machine response regardless of the
formant and oscillator frequencies present. Figure 4 represents the
basic system approach for realization of this apparatus.
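
A short worked case may make the pitch problem concrete. The pairing of numbers is chosen here purely for illustration and assumes an ideal low-pass filter: suppose the F1 oscillator is set to 270 Hz (the male average F1 of the vowel in "heed", Table 2) and a child whose pitch is 272 Hz (a child average from the same table) utters any voiced sound without the pitch being removed first. The difference frequency between the pitch and the oscillator is then

```latex
\[
  \left| f_{\text{pitch}} - f_{\text{osc}} \right|
  = \left| 272 - 270 \right|\ \text{Hz}
  = 2\ \text{Hz} < 10\ \text{Hz},
\]
```

which lies inside even the narrowest 10 Hz low-pass setting, so the first formant channel would respond whatever vowel is actually spoken.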


Figure 4. AESTR Electronic Transducer, Detector, and Display System.
(Block diagram. Signal path: Microphone; Preamplifier; Pitch Attenuator
(variable high pass filter); then two parallel channels, First Formant
and Second Formant, each consisting of a Local Oscillator and Mixer, a
Low pass filter (variable 10, 15, 30 and 60 Hz) and a Visual Indicator
(0-10 V voltmeter). Both channels feed a Binary Decision (are both
outputs at maximum?); on YES, the "Correct" green light and the
External Motivation output are activated.)

5. CONTROL PANEL DESIGN AND OPERATION


AESTR is to be operated by individuals who do not possess
an engineering background. Therefore the panel is designed
to be self-explanatory and to require minimum instruction for
operation. The controls are fairly large to permit positive
grasp by the operator. Also, the functional location of the knobs
and visual indicators is evident from the partition lines. The
objective is to have the panel functions reflect the needs of
the operator rather than the requirements of the internal
circuitry. AESTR's control panel is shown in Figure 5.

The "volume" control is self-explanatory and permits the
operator to vary the gain of the preamplifier circuit.

The "pitch" control permits selection of four cutoff
frequencies of the high-pass filter circuit in order to suppress
the fundamental frequency of a voice while passing all the
formant frequencies. In Table 4 below, the letter positions
are identified with the 3 db cutoff frequencies of the
high-pass filter.

TABLE 4
HIGH-PASS FILTER CUTOFF FREQUENCIES

Pitch control position     Frequency (Hz)

A (male voice)             75
B (female voice)           190
C (child's voice)          450
D (special use)            1050

The pitch control setting is not critical for the back vowel
sounds such as OW in the word "father" and can remain in the
"A" or "B" setting for all speakers regardless of sex or age.
Position "D" is used when working with a central vowel such
as ER in the word "bird". It is necessary to suppress the
frequencies below 1 kHz and operate with the second and/or third
formants in order to have the machine respond only to this
particular vowel. This technique was developed during the
testing of AESTR and is discussed further in section 13.

The "F1 LPF cutoff" and "F2 LPF cutoff" controls set the cutoff
frequencies of the variable low-pass filters. The controls are
located above the first and second formant voltmeter indicators
respectively. Cutoff frequencies of 10, 15, 30 and 60 Hz are
printed around the periphery of the control knobs. Normally the
controls are initially set in the 60 Hz position when searching
for a voice formant. This setting provides the widest possible filter
pass band, such that the voltmeter needle will begin to deflect
up scale whenever the oscillator and voice formant are within
±60 Hz of each other. As the two frequencies become more nearly
coincident, the voltmeter needle will show a maximum scale
deflection. When the operator has the oscillator set at a
frequency which gives the maximum needle deflection, he may
elect to switch the "LPF cutoff" control to 30 Hz in order to
narrow the filter response pass band. It may be necessary
to readjust the local oscillator slightly for maximum scale
deflection. This procedure can be continued for the 15 and
10 Hz cutoff frequencies respectively.

The "F1 Filter Sensitivity" and "F2 Filter Sensitivity"
controls vary the gain of the filters. The word "sensitivity"
is chosen for contrast against the "volume" control nomenclature
and is intended to prevent any misunderstanding between the two
types of controls. The "filter sensitivity" controls are
adjusted to make the meter needle deflect to full scale when
the local oscillator and formant frequencies most nearly
coincide. Each vowel sound will have its unique "filter
sensitivity" setting because of the varying intensity levels of the
formants of the individual phonemes. The operator must determine
these settings experimentally, since the sensitivity is also a
function of the intensity of the speaker's voice. It is
advisable to keep the "volume" control knob at a minimum setting
and the "sensitivity" control knobs at a high setting to reduce
the effects of acoustic and electrical noise.

The "correct" green light illuminates when both formant
indicators read an up scale deflection of [illegible] volts. Light
activation is delayed [illegible] seconds and, once lighted, the
light stays on for a period of 2 seconds. The delay prevents the light
from being activated by transient full scale deflections which
occur from plosive type consonant sounds preceding a vowel
in such a word as "bar". The light hold time of 2 seconds
prevents the light from flickering if the voice begins to
quiver during articulation of a phoneme.
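
The delay and hold behaviour of the light can be sketched as a small state machine. This is an illustrative model only: the class name `CorrectLight` is hypothetical, the 0.5 second delay used in the example is an assumed placeholder because the actual delay is not legible in the text, and the hold period is modelled as restarting for as long as both meters remain at full scale, which is also an assumption.

```python
class CorrectLight:
    """Model of the "correct" light with an activation delay and a hold time."""

    def __init__(self, delay_s, hold_s=2.0):
        self.delay_s = delay_s          # assumed value; not legible in the text
        self.hold_s = hold_s            # 2 second hold time stated in the text
        self._full_scale_since = None   # time both meters first reached full scale
        self._off_at = None             # time at which a lit light goes out

    def update(self, t_s, both_full_scale):
        """Feed the current time and meter state; return True while lit."""
        if both_full_scale:
            if self._full_scale_since is None:
                self._full_scale_since = t_s
            # Light (or keep alight) only after the condition has persisted.
            if t_s - self._full_scale_since >= self.delay_s:
                self._off_at = t_s + self.hold_s
        else:
            # A dropout resets the delay timer but not a hold already running.
            self._full_scale_since = None
        return self._off_at is not None and t_s < self._off_at


if __name__ == "__main__":
    light = CorrectLight(delay_s=0.5)
    for t, meters in [(0.0, True), (0.2, False), (1.0, True),
                      (1.5, True), (2.0, False), (3.6, False)]:
        print(t, light.update(t, meters))
```

In this model a brief plosive transient never lights the lamp, while a short waver after lighting does not extinguish it until the hold time has run out.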

In the rear of the AESTR cabinet is located an ordinary
female 115 volt receptacle. Any external motivational device,
such as an M&M candy dispenser, can be attached to this
terminal and will be operated automatically since the terminal
provides 115 volts only during the interval when the "correct"
light is illuminated.

AESTR also has the capability of measuring the pitch of
a person's voice. Turn the "pitch" control clockwise beyond
[illegible] it stops. Set the "F1 LPF cutoff" control
to 10 Hz and the "F1 Sensitivity" control to a maximum value of
10. Turn the "F2 Sensitivity" full counterclockwise to a value
of 0. Sweep the F1 local oscillator through a range of 60 to
500 Hz. The speaker's pitch will be read on the F1 oscillator
frequency setting when the first formant visual indicator has
a maximum up-scale deflection.

6. PREAMPLIFIER DESIGN

The mixer circuit is able to accept a maximum input signal
of 0.8 volts peak to peak. A preamplifier is necessary,
especially if a dynamic microphone is being used, to amplify the
voice sound for maximum mixer output. The Fairchild uA709
operational amplifier was selected to perform this function,
utilizing the standard feedback configuration and necessary
frequency compensation. It is shown schematically in Figure 6.

The uA709 comes in an epoxy TO-5 configuration. The detailed
circuitry employed in this integrated circuit and performance
data are readily available from the manufacturer. [11]
The price of this device is not considered to be excessive
at the present time.

7. MIXER CIRCUIT

The transfer characteristic of a field-effect transistor
(FET) made by the diffusion process has a square-law
relationship between the drain to source current, I_ds, and the gate
to source voltage, V_gs. It is expressed as [25]

$$ I_{ds} = I_{DSS}\left(1 - \frac{V_{gs}}{V_p}\right)^2 \qquad (1) $$

where I_DSS is the saturation drain current when the gate is shorted
to the source (V_gs = 0) and V_p is the pinch off voltage. For
mixer operation let V_gs be represented as the sum of two
sinusoidal voltages, both of which can be simultaneously impressed
on the gate of an FET, or one impressed on the gate and the
other on the source of the FET. Either process will cause
mixing operation and V_gs as defined below holds true for both
cases

$$ V_{gs} = V_{GS} + V_s\cos\omega_s t + V_o\cos\omega_o t \qquad (2) $$

where V_GS is the bias gate to source voltage. V_s cos ω_s t
represents the source or voice sound while V_o cos ω_o t is the
sinusoid generated by the local oscillator which is applied
to the gate or source of the FET. Substituting (2) into (1)
and expanding, we obtain

$$
\begin{aligned}
I_{ds} = \frac{I_{DSS}}{V_p^2}\Big\{ & (V_p - V_{GS})^2 + \tfrac{1}{2}V_s^2 + \tfrac{1}{2}V_o^2 \\
& - 2(V_p - V_{GS})(V_s\cos\omega_s t + V_o\cos\omega_o t) \\
& + \tfrac{1}{2}V_s^2\cos 2\omega_s t + \tfrac{1}{2}V_o^2\cos 2\omega_o t \\
& + V_s V_o\left[\cos(\omega_s + \omega_o)t + \cos(\omega_s - \omega_o)t\right] \Big\} \qquad (3)
\end{aligned}
$$

The drain current has DC components plus six individual
frequencies as a result of the square-law mixing of an FET. This
response shows that only frequencies of the form ω_s, ω_o, 2ω_s,
2ω_o, ω_s + ω_o, and ω_s - ω_o are obtained while other frequencies
of the form mω_s ± nω_o, which must be suppressed in conventional
mixers, are greatly reduced with an FET mixer circuit. [2

The frequency component of the drain current which is of
interest is V_s V_o cos(ω_s - ω_o)t. It is separated from the other
components by coupling to the mixer output a DC blocking
capacitor followed by a low-pass filter which has a cutoff
frequency ω_f such that ω_s - ω_o < ω_f < both ω_s and ω_o.
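
The frequency content predicted by the square-law relation can be checked numerically. The sketch below evaluates the drain current of equation (1) for a two-tone gate voltage and measures the amplitude at selected frequencies with a single-bin discrete Fourier sum. I_DSS = 10 mA and V_p = -8 V are the 2N3819 data sheet values quoted in the text; the bias of -5.5 V, the two 0.4 V amplitudes and the 730 Hz and 700 Hz tones are assumptions chosen only for illustration, as are the function names.

```python
import math

I_DSS = 10e-3       # A, saturation drain current (data sheet value from the text)
V_P = -8.0          # V, pinch off voltage (data sheet value from the text)
V_GS_BIAS = -5.5    # V, assumed gate bias
V_S, F_S = 0.4, 730.0   # assumed voice tone: amplitude (V) and frequency (Hz)
V_O, F_O = 0.4, 700.0   # assumed local oscillator: amplitude (V) and frequency (Hz)


def drain_current(t):
    """Square-law drain current, equation (1), for the two-tone gate voltage."""
    v_gs = (V_GS_BIAS
            + V_S * math.cos(2 * math.pi * F_S * t)
            + V_O * math.cos(2 * math.pi * F_O * t))
    return I_DSS * (1 - v_gs / V_P) ** 2


def amplitude_at(freq_hz, sample_rate=20000, duration_s=1.0):
    """Amplitude of the drain-current component at freq_hz (single DFT bin)."""
    n = int(sample_rate * duration_s)
    re = im = 0.0
    for k in range(n):
        t = k / sample_rate
        i = drain_current(t)
        re += i * math.cos(2 * math.pi * freq_hz * t)
        im += i * math.sin(2 * math.pi * freq_hz * t)
    return 2 * math.hypot(re, im) / n


if __name__ == "__main__":
    predicted = I_DSS * V_S * V_O / V_P ** 2   # difference-term amplitude
    print("difference (30 Hz):", amplitude_at(F_S - F_O), "predicted:", predicted)
    print("sum (1430 Hz):     ", amplitude_at(F_S + F_O))
    print("2fs + fo (2160 Hz):", amplitude_at(2 * F_S + F_O))
```

Under these assumptions the 30 Hz difference component has amplitude I_DSS V_s V_o / V_p^2, and a higher-order product such as 2ω_s + ω_o is absent, in line with the ideal square-law argument. A real device departs from the ideal law, so such products are reduced rather than eliminated.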

Initially, a dual gate metal oxide semiconductor (MOS)
FET was selected as being particularly well suited for use in
mixing two audio frequencies. Eight 3N141 MOS-FET's were ordered
but due to excessive delay in receipt of these devices, it was
necessary to design and build a mixer using a single gate
FET already in stock in the school electronics issue room.
This device requires that the voice signal be impressed on the
gate while the local oscillator signal is applied to the source
terminal. Several types of FET's available from the issue room
were tested for mixing action in the circuit shown in Figure 7a.
The 2N3819 proved to be the most satisfactory device. Its
transconductance as a function of gate to source voltage is
quite linear over the range from zero V_gs to the pinch off voltage
V_p. This characteristic enhances the mixing action of an FET. [20]

The  local  oscillator  used  in  AESTR  is  a  URM-127  signal 
generator.  It  has  an  output  impedance  of  approximately  100 
ohms  and  can  deliver  a  signal  ranging  from  the  microvolt  range 
to  a  maximum  of  10  volts. 

In designing the mixer circuit the author relied on the
manufacturer's data sheet for the 2N3819 FET. It is an
N-channel device with V_p = -8 volts, I_DSS = 10 mA and an average
transconductance of 4000 micromhos for zero gate bias. The DC
drain current I_D was selected to be 1 mA and the mixer circuit
was designed to give a voltage gain of 10. These conditions
were incorporated into the circuit design [25] and values were
obtained such that R_S = 5.5 kohms and R_D = 8.1 kohms. The
network in Figure 7a was constructed and the components
subsequently modified to the circuit of Figure 7b to obtain
optimum mixing.
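
The quoted resistor values can be approximately reproduced from the data sheet figures. The sketch below is a reconstruction under stated assumptions, not the design procedure of reference [25]: it assumes a self-biased stage (gate at DC ground, so V_GS = -I_D R_S), a transconductance that scales with (1 - V_GS/V_p) as the square law implies, and a gain of g_m R_D with the source resistor bypassed. The function name `mixer_bias` is hypothetical.

```python
import math


def mixer_bias(i_dss_a=10e-3, v_p_v=-8.0, gm0_s=4000e-6,
               i_d_a=1e-3, gain=10.0):
    """Return (R_S, R_D) in ohms for a self-biased square-law FET stage.

    Assumptions of this sketch: V_GS = -I_D * R_S, g_m = g_m0 * (1 - V_GS/V_p),
    and voltage gain = g_m * R_D.
    """
    ratio = math.sqrt(i_d_a / i_dss_a)   # equals (1 - V_GS/V_p) from the square law
    v_gs = v_p_v * (1 - ratio)           # gate to source bias voltage (negative)
    r_s = -v_gs / i_d_a                  # source resistor that sets the bias
    gm = gm0_s * ratio                   # transconductance at the bias point
    r_d = gain / gm                      # drain resistor for the desired gain
    return r_s, r_d


if __name__ == "__main__":
    r_s, r_d = mixer_bias()
    print(f"R_S = {r_s / 1e3:.2f} kohms, R_D = {r_d / 1e3:.2f} kohms")
```

With these assumptions R_S comes out near 5.5 kohms, matching the quoted value, and R_D near 7.9 kohms, close to the quoted 8.1 kohms; the small difference suggests that the original design used slightly different assumptions.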

Successful mixing of any two audio frequencies is
accomplished by means of this circuit with no lower limit on the
input and local oscillator voltages. An upper limit of 1.5
volts peak to peak for the signal and local oscillator voltages
cannot be exceeded; otherwise the output is clipped. Optimum
operation of this mixer circuit is set for an input of
approximately 0.8 volts peak to peak. Above this voltage,
the follow-on filter circuits begin to give spurious outputs
due to sweeping of either the voice oscillator or local
oscillator across the frequency spectrum. This effect is
noticeable on the F1 and F2 voltmeter indicators and masks the
frequency response of the filters.

The  2N3819  FET's  have  consistently  performed  the  mixing 
operation  on  a  daily  basis  during  the  entire  period  covering 
the design and testing of the formant indicators. These particular
FET's are highly recommended both for their reliability
and  usefulness  in  audio  mixing  circuits. 

As an epilog to the mixer design realization, the 3N141
MOS-FET's did arrive finally. Other students have had limited
success in using these devices for mixing. Special care must
be exercised in using them, especially with regard to preventing
any external high voltages (static charges, etc.) from
accidentally damaging the devices.

8. FILTER CONSIDERATIONS

Many types of network designs will yield either low-pass,
high-pass or band-pass frequency filters. The networks may be
synthesized using only passive elements [13, 19] or, in addition
to resistive or capacitive components, incorporate a radio tube
[35], transistor [15] or an integrated circuit operational
amplifier [4, 5, 14]. When a filter design calls for cutoff frequencies
below 100 Hz, several considerations tend to indicate that an
active RC filter circuit is the most desirable type. Table 5
lists the relative characteristics of passive and active filters
with cutoff frequencies below 100 Hz.

Active network synthesis can be classified in a number of
ways, depending on the purpose of the active elements and the
network configuration. The three main types of active synthesis
are:

a. Classical Amplifier Design, where the active element is
part of the parameters of the network.

b. Feedback Systems, where feedback theories are used to
synthesize poles and zeros of a network function. In this case
active elements are used as isolation or amplification devices,
or as functions of operational amplifiers.

c. Modification of Passive Synthesis, where techniques of
passive synthesis are used to realize portions of a network
that are connected together by active elements.

In all three categories listed, the active elements
are used mainly as controlled-source devices which perform
functions of subtraction, negative-constant multiplication or
inversion. They can be treated as black boxes performing their
prescribed mathematical functions. [30]

TABLE 5

COMPARISON OF PASSIVE AND ACTIVE FILTER CHARACTERISTICS
FOR LOW FREQUENCY APPLICATIONS

Passive RLC or RC Filters

1. Inductors tend to be expensive, large, heavy and susceptible
to hum (A-C line frequency) pickup.

2. For filters consisting of only resistors and capacitors,
the poles and zeros of the driving-point immittance functions
of RC filters are restricted to the negative real axis of
the s plane, and the same is true for the poles of the transfer
functions.

3. A maximum attenuation of 6 db per octave can be obtained
with each individual RC filter section.

4. RC filters exhibit attenuation of the signal in the
designed pass band.

Active RC Filters

1. An inductor can be replaced by an active circuit which has
an appropriate input impedance.

2. It is possible to use an active element as a cathode or
emitter follower or an operational amplifier in synthesis of
the filter.

3. It is possible to realize driving-point functions and
transfer functions with no restriction on the poles and zeros.

4. Positive pass band gain can be designed into the circuit.

5. Simpler network configurations at lower cost can be
achieved.

The ideal low-pass filter with unity transmission below
and zero above a certain frequency, with no phase shift in the
pass band, is unattainable in the real world. Three
approximations to the ideal filter can be realized by means of the
Butterworth, Bessel or Chebyshev filters. [14]

The Bessel filter exhibits maximally flat time delay
(linear phase) and therefore is sometimes used as a time delay
network. Its amplitude response in the pass band is monotonically
decreasing rather than flat. Its rate of fall beyond cutoff is
less than that of the Butterworth or Chebyshev filters.

The Chebyshev class of filters has an equal magnitude
ripple in the pass band and maximum rate of fall beyond the 3 db
cutoff. The response of the filter at the cutoff frequency is
always that of a minimum of the ripple. The allowable degree
of ripple in the pass band can be accounted for in the filter
design.

The Butterworth filter is obtained by locating the poles
of the network in accordance with the zeros of the Butterworth
polynomial. The normalized transfer function is of the form

$$|Z_{12}(j\omega)|^2 = \frac{1}{1 + \omega^{2n}}$$

where n is the number of poles in the network and ω is the ratio
of frequency of interest to cutoff frequency. The filter has
a maximally flat amplitude response in the pass band and the
slope of rolloff outside the pass band increases directly with
the number of poles in the transfer function. The response falls
off at approximately a constant 6n db/octave. The phase
characteristics of the Butterworth filter are not very linear.
The time delay varies as a function of frequency.
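
The 6n db/octave figure can be checked numerically from the
normalized Butterworth magnitude function given above. The following
is a minimal sketch; the function names are illustrative and the
evaluation points are arbitrary choices well above cutoff.

```python
import math

def butterworth_gain_db(w, n):
    """Gain in db of an n-pole Butterworth low-pass filter.

    w is the ratio of the frequency of interest to the cutoff frequency.
    """
    return -10.0 * math.log10(1.0 + w ** (2 * n))

def rolloff_db_per_octave(w, n):
    """Attenuation added when the frequency doubles from w to 2w."""
    return butterworth_gain_db(w, n) - butterworth_gain_db(2.0 * w, n)

for poles in (1, 2, 4):
    at_cutoff = butterworth_gain_db(1.0, poles)
    slope = rolloff_db_per_octave(8.0, poles)
    print(f"n = {poles}: {at_cutoff:.2f} db at cutoff, "
          f"{slope:.2f} db/octave well above cutoff")
```

Every order is 3 db down at cutoff, and the slope approaches 6n db per
octave, so a four pole filter falls at about 24 db per octave.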


9.  LOW-PASS  FILTER  DESIGN 

The F1 low-pass filter and the F2 low-pass filter in AESTR
are identical circuits. Each filter is a four pole Butterworth
response circuit with discrete cutoff frequencies of 10, 15,
30 and 60 Hz. The Rauch type filter network is selected since
the circuit values are rapidly calculated for multiple filter
sections by using the normalized tables contained in Foster's
paper. [14] Also this network can be modified to provide a
continuously variable cutoff frequency or to have positive gain
by modifying the resistive elements of the circuit. [28] In AESTR,
the filters have unity gain and the cutoff frequencies are
established by switching various capacitor values into the
network while maintaining all resistors at a constant value
of 10K. The author decided to vary the capacitors rather than
the resistors to control cutoff frequencies because of
hardware considerations. As more data and experience are gained in
the operation of AESTR, it may be desirable to design positive
gain and continuously variable cutoff frequency into the filters
based on recommendations of the speech therapists. Each filter
is mounted on a separate circuit board and modifications can
be accomplished without changing the internal chassis wiring.

The Rauch filter basic building block is a single section
which has two poles in the complex frequency plane. Its schematic
and transfer function are shown in Figure 8. Two cascaded
sections are required to obtain a roll-off of 24 db per octave
for frequencies above the cutoff frequency. A 25 uf coupling
capacitor is inserted between sections to block D.C. components
while a 10K shunt resistor is inserted at the input of each
section to provide a D.C. return path to the base of the
inverting input transistor enclosed in the Fairchild uA 709
operational amplifier. This resistor also develops the required
input voltage necessary for proper filter response. All filter
network resistors are fixed at 10 K ohms to provide an adequate
filter impedance match to the mixer output and to determine
practical capacitor values which can be obtained for fabrication
of the network.

Figure 8. Two Pole Rauch Low-Pass Filter and Transfer Function

From the table of normalized capacitor values for a
Butterworth filter with four poles [14], it is a simple matter to
calculate capacitor values for various low-pass filter cutoff
frequencies. The calculated and actual values used in the
AESTR apparatus are listed in Table 6. Although the actual
capacitor component values deviated from the calculated values,
the filter response is quite satisfactory. Figure 9 is a plot
of the frequency response curves of the low-pass filters in
AESTR.
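
The capacitor calculation can also be sketched without the normalized
tables. The sketch below does not reproduce Foster's tables or the
transfer function of Figure 8; it assumes the usual equal-resistor
multiple-feedback (Rauch) section, whose response is
H(s) = -1/(1 + 3RCf s + R^2 Cs Cf s^2), with Cs the shunt capacitor
and Cf the feedback capacitor. The function names, and the assignment
of the higher-Q section to C1 and C2, are assumptions made for
illustration.

```python
import math

def butterworth_section_qs(n_poles):
    """Q of each two-pole section of an even-order Butterworth filter,
    highest Q first."""
    qs = [1.0 / (2.0 * math.cos((2 * k - 1) * math.pi / (2 * n_poles)))
          for k in range(1, n_poles // 2 + 1)]
    return sorted(qs, reverse=True)

def rauch_capacitors_uf(fc_hz, r_ohms, q):
    """(shunt, feedback) capacitors in microfarads for one section.

    Follows from the assumed equal-resistor response:
    Cs = 3*Q / (w0*R) and Cf = 1 / (3*Q*w0*R).
    """
    k = 1.0 / (2.0 * math.pi * fc_hz * r_ohms)   # 1/(w0*R), in farads
    return 3.0 * q * k * 1e6, k / (3.0 * q) * 1e6

def four_pole_capacitors_uf(fc_hz, r_ohms=10e3):
    """[C1, C2, C3, C4] in microfarads for a four pole Butterworth filter."""
    caps = []
    for q in butterworth_section_qs(4):
        caps.extend(rauch_capacitors_uf(fc_hz, r_ohms, q))
    return caps

for fc in (10, 15, 30, 60):
    print(fc, "Hz:", [round(c, 3) for c in four_pole_capacitors_uf(fc)])
```

With all resistors at 10K and a 10 Hz cutoff, this gives approximately
6.24, 0.41, 2.58 and 0.98 uf, which agrees with the legible computed
entries of Table 6 to within about 0.02 uf. Because every value scales
inversely with cutoff frequency, switching capacitors alone is enough
to move the cutoff.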

The various capacitors are mounted on a five pole, two gang
switch which is operated from the front panel of AESTR. The
ten inch cable wires between capacitors and circuit boards do
not contribute any noticeable adverse effect on the filter
response.

TABLE 6

CAPACITOR VALUES OF FOUR POLE BUTTERWORTH RESPONSE RAUCH
TYPE LOW-PASS FILTER WITH ALL RESISTOR VALUES SET AT 10K OHMS

                          Cutoff Frequency (Hz)
Capacitor Value (uf)      10      15      30      60

C1   computed            6.25    4.17    2.08    1.04
     actual              8.2     4.0     2.0     1.0

C2   computed            0.41    0.27    0.14    0.068
     actual              0.4     0.22    0.13    0.068

C3   computed            2.58    1.73    0.86    0.43
     actual              4.0     1.5     0.8     0.4

C4   computed            0.98    0.65    0.33    0.164
     actual              1.0     0.8     0.4     0.168

A zero output response is observed for zero beat frequency
output of the mixer stage due to the coupling capacitors of
the filter. This effect does not affect the purpose for which
AESTR is to be used since it is practically impossible for a
person to hold his vowel formants exactly on frequency with
the local oscillators. The continuous deviations of the formants
are sufficient to cause beat frequencies which will be present in the
pass bands of the filters.
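
The relation between formant frequency, local oscillator setting and
filter cutoff can be illustrated with a short sketch. It is an
idealization: the low-pass filter is treated as a brick wall at its
cutoff frequency, the zero-beat case is rejected to reflect the
blocking action of the coupling capacitors, and the function names and
example frequencies are chosen only for illustration.

```python
def beat_frequency_hz(formant_hz, local_osc_hz):
    """Difference frequency produced by the mixer."""
    return abs(formant_hz - local_osc_hz)

def passes_low_pass(formant_hz, local_osc_hz, cutoff_hz):
    """True when the beat lies inside the idealized pass band.

    A beat of exactly zero is treated as blocked, since the coupling
    capacitors do not pass D.C.
    """
    beat = beat_frequency_hz(formant_hz, local_osc_hz)
    return 0.0 < beat <= cutoff_hz

# Local oscillator at 1470 Hz with the 15 Hz cutoff selected.
print(passes_low_pass(1480.0, 1470.0, 15.0))   # beat of 10 Hz
print(passes_low_pass(1470.0, 1470.0, 15.0))   # exact zero beat
print(passes_low_pass(1400.0, 1470.0, 15.0))   # beat of 70 Hz
```

Narrowing the cutoff therefore narrows the band of formant
frequencies around the oscillator setting that produces a response.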

The filter output is passed into an amplifier using a 1 K ohm
input resistor and a 1 Megohm potentiometer in the feedback circuit
across a uA 709 operational amplifier. The potentiometer control is
designated as "filter sensitivity" on the front panel of AESTR.

As stated previously, two identical low-pass filters are contained
in the AESTR system. One filter responds to the first formant beat
frequency and the other responds to the second formant beat frequency
generated in their respective mixer circuits. Figure 10a is a
schematic of the complete filter network while Figure 10b is a schematic
of the beat frequency amplifier which drives a 0-10 volt rectifying
voltmeter. Several types of meters were considered for use as
visual indicators of the beat frequency. The meters used in AESTR
were selected simply because they were available in the stockroom
and adequately served AESTR's purpose.

In  Figures  10a  and  10b,  the  uA  709  operational  amplifiers 
are  frequency  compensated  in  the  same  manner  shown  in  the 
preamplifier  schematic  of  Figure  6.  The  components  have  been 
omitted  from  the  filter  and  amplifier  circuits  for  the  sake 
of  clarity.  Also  the  schematics  identify  terminals  associated 
with  circuit  board  B.  Circuit  board  C  is  identical  to  B  with 
respect  to  all  terminal  connections  and  component  values. 


Figure 10a. Four Pole Rauch Low-Pass Filter Schematic

Figure 10b. Low-Pass Filter Gain Schematic and Beat Frequency Indicator

10.  HIGH-PASS FILTER DESIGN

In order to prevent the fundamental frequency of the vocal
bands from passing directly through the mixer and low-pass
filter circuits, a high-pass filter network is inserted between
the preamplifier and mixers. Its configuration is realized
by the Sallen and Key method. [35] A high-pass filter has a
normalized frequency transfer function of

$$\frac{E_{out}}{E_{in}} = \frac{s^2}{s^2 + ds + 1}$$

where d is the damping factor. This type of response is obtained
from the basic high-pass filter network of Figure 11.

Figure 11. Basic High-Pass Filter Network

Such a filter will give a 12 db per octave roll-off for
frequencies below the cutoff frequency. R1, R2, C1 and C2 and the
gain of the emitter follower act together to determine the cutoff
frequency and the shape of the response curve during the
transition from the stop band to the pass band. In the actual circuit,
R2 is equal to the resistance of three parallel resistors. These
are the two bias resistors and the input resistance of the
2N226 transistor.
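
The 12 db per octave figure follows directly from the normalized
two-pole high-pass transfer function with damping factor d. The sketch
below evaluates that function on the frequency axis; it assumes
d equal to the square root of 2, the value that gives a two-pole
Butterworth response, since the damping factor actually used is not
stated here. The function name is illustrative.

```python
import math

def highpass_gain_db(w, d=math.sqrt(2.0)):
    """Gain in db of s^2 / (s^2 + d*s + 1) evaluated at s = jw.

    w is the ratio of the frequency of interest to the cutoff frequency.
    """
    s = complex(0.0, w)
    h = (s * s) / (s * s + d * s + 1.0)
    return 20.0 * math.log10(abs(h))

at_cutoff = highpass_gain_db(1.0)
# Compare two frequencies one octave apart, well below cutoff.
slope = highpass_gain_db(1.0 / 8.0) - highpass_gain_db(1.0 / 16.0)
print(f"{at_cutoff:.2f} db at cutoff, {slope:.2f} db per octave below it")
```

For this damping factor the section is 3 db down at cutoff and
attenuates at very nearly 12 db per octave in the stop band.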

The AESTR high-pass filter schematic is shown in Figure 12.
An emitter follower drives the high-pass filter stage which
consists of two cascaded sections to yield an expected
attenuation of 24 db per octave. [16] The individual sections do
yield a Butterworth response of 12 db per octave roll-off,
but, when cascaded together, a total roll-off of only 20 db
per octave is realized with an additional +3 db hump at the
corner frequency. The actual response shown in Figure 13 is
considered satisfactory for the pitch elimination function
in AESTR's system.

Note that the pitch eliminator has four discrete cutoff
frequencies of 75, 190, 450 and 1050 Hz. The desired cutoff
is obtained by switching in various capacitors mounted on a
five pole, two gang switch attached to AESTR's front panel.
The fifth position permits the high-pass filter to be bypassed
so that AESTR can be used to discriminate between voiced and
unvoiced consonants. This feature was incorporated into the
apparatus after Dr. Gray operated a breadboard version of the
system and suggested that a "pitch" or "no pitch" capability
be incorporated into AESTR.

11.  DECISION AND RESPONSE CIRCUIT DESIGN

The beat frequency outputs of the first and second formant
filters are applied to terminals 3 and 20 of circuit board D,
whose schematic is shown in Figure 14. These waveforms are
half-wave rectified and smoothed by a low-pass passive RC
filter. The resultant D.C. voltages are impressed on the two
input gates of an AND circuit. When both F1 and F2 beat frequency
rectified voltages are simultaneously present and also of
sufficient magnitude to cause +7.5 volts D.C. to appear on each
diode of the AND gate, the diodes become reverse biased, thereby
directing a 400 microampere current into the base of the 2N2924
transistor. This action drives the transistor into saturation,
permitting a collector current of 30 milliamperes to flow
through the relay coil, which acts as the load for the circuit,
and closes the relay contacts. A zener diode is inserted at
the base terminal of the transistor to prevent the transistor
from being switched on when only one diode of the AND gate is
reverse biased.
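
The decision rule implemented by the diode AND gate and transistor
switch can be summarized logically. The sketch below is only a
behavioral description under the assumption that +7.5 volts acts as a
sharp threshold on each rectified input; it does not model the diode,
zener or transistor characteristics, and the names are illustrative.

```python
THRESHOLD_VOLTS = 7.5  # D.C. level quoted in the text for each AND-gate diode

def relay_energized(f1_dc_volts, f2_dc_volts, threshold=THRESHOLD_VOLTS):
    """True only when both rectified formant voltages reach the threshold."""
    return f1_dc_volts >= threshold and f2_dc_volts >= threshold

print(relay_energized(8.0, 9.0))   # both formant channels respond
print(relay_energized(8.0, 2.0))   # only the F1 channel responds
print(relay_energized(1.0, 9.0))   # only the F2 channel responds
```

The relay, and with it the "correct" light, is driven only in the
first case, when both formants fall inside their selected windows at
the same time.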

The relay is a stockroom surplus item which operates on
14 volts and 25 milliamperes. It has two sets of contacts.
One set activates the green panel "correct" light and the other
set connects a 115 volt supply to the appliance socket mounted
on the rear chassis of AESTR. The Monterey Institute for Speech
and Hearing does have a 115 volt relay operated device which
dispenses M&M candy disks to children when they perform desired
tasks. AESTR is able to operate this dispenser or any other
115 volt device in response to the desired articulation of
the child.

12.  FABRICATION

Economy and availability of supplies dictated construction
of AESTR. All components are housed in an aluminum case 16"
wide, 12" deep and 10" high. The control panel is inclined
20° from the vertical so that the values of the control settings
can be read with greater ease. The case was handmade in the
student metal shop. In addition the control panel was rubbed
with emery paper until the metal acquired a satin finish.

The chassis for circuit components has four 22 terminal
sockets which accept the standard 4½" by 6" circuit boards.
Also mounted on the chassis are an 11 pin socket for the power
supply package, mounting holes for the relay, an octal
socket for power distribution cables and a 2? pin socket for
signal distribution cables which originate from the components
mounted on the rear of the control panel. Fusing is provided
for circuit protection.

The circuit boards are identified by letters which are:
Board A  Preamplifier, High-Pass filter, Mixers
Board B  F1 Low-Pass filter, amplifier
Board C  F2 Low-Pass filter, amplifier
Board D  Rectifier, AND circuit, transistor switch
The functional segregation of the circuit boards permits future
changes to the circuitry by simply replacing an entire board.
It should not be necessary to change the internal wiring of
the chassis for such modifications.

Original circuit boards used for mounting of components
were the etched contact plugboards Vector #838PWE. They are
considered to be restrictive in flexibility. The Vector
#838 non-etched boards proved to be more versatile. Components
are mounted easily and securely with the aid of metal washers
riveted on the holes through which the lead wires pass
to the other side of the board. Additional holes must be
drilled into the board to accommodate the integrated circuit
octal socket. Learning how to properly mount components so as
to conserve space, minimize leads and avoid ground loops is
considered by the author to be a very useful and important aspect
of this thesis.

The electronic circuits of AESTR require 30 milliamperes
on both the plus and minus 15 volt supply terminals. When the
relay and "correct" light are activated, the current drain
increases to 95 milliamperes on both supply terminals. The
power is supplied by a Power Mate Power Supply, Model
DRA16-.2/16-.2. Its regulated output can be set between 15 and 1?
volts and is rated to provide 200 milliamperes on the plus
and minus terminals. The voltage regulation is excellent even
during sudden current level changes when the light and relay
activate. Figure 15 shows the power distribution in AESTR.

As stated previously, the filter capacitors are mounted
on five pole, two gang switches. These components are located
longitudinally around the periphery of the switches so as to
economize on space and also obtain structural support.

Trouble  shooting  the  system  after  AESTR  was  completely 
wired  consumed  many  hours.  A  component  value  error  and  cable 
error  required  correcting  before  successful  operation  of  the 
assembled  machine  could  be  achieved. 

Numerous minor problems were encountered in the fabrication
of AESTR. These difficulties did serve to prove the fact
that transition from theory to a practical working apparatus
is not a trivial matter.

Figure 15. AESTR Power Distribution

13.  PRELIMINARY TEST RESULTS

AESTR was initially tested during the final phase of its
design stage by Dr. Gray at the school electronics laboratory.
At that time, the F1 and F2 low-pass filters had fixed cutoff
frequencies of 50 and 100 Hz respectively. His evaluation
of the machine indicated that the pass band of the filters
had to be reduced in order to have the machine properly
discriminate between closely related voiced sounds such as
ER and E. Therefore the filters were redesigned to have a
series of discrete cutoff frequencies of 10, 15, 30 and 60 Hz.

During this initial evaluation, it was also learned that
air streams impinging on the microphone cause a transient
response in AESTR of sufficient magnitude to activate the relay
circuit. To avoid this type of false response, the speaker
should hold the microphone in a vertical position approximately
four inches away from and slightly below his lips. In the case
of a child, a microphone headset similar
to the kind commonly worn by telephone operators would keep the
microphone properly positioned relative to the mouth of the
speaker.

After its fabrication, AESTR was tested by the author.
The machine control settings obtained for an adult male voice
and female voice articulation of the vowel sounds are listed
in Table 7. These settings represent the best values which
could be obtained for the smallest spectral window in the
F1-F2 plane. In all cases, the first formant of the vowel was
readily located with minimum sweeping of the F1 local
oscillator. The second formant was more difficult to locate for
vowel sounds IY, I, and ER. The F2 local oscillator must be
swept across its frequency range three or four times before
the operator is certain that the F2 frequency has been located.
This is to be expected since the amplitude of the second
formant is lower than that of the first formant for all vowel sounds.

Pitch  measurements  were  made  according  to  the  procedures 
stated  in  section  5.  Pitch  frequencies  are  rapidly  determined 
and  do  show  a  variation  with  the  vowel  sounds  as  indicated 
in  Table  2. 

TABLE 8

AESTR PITCH MEASUREMENTS OF AN ADULT MALE VOICE FOR VOWEL SOUNDS

Vowel        IY    I     E     AE    A     OW    U     OO    UH    ER
Pitch (Hz)   110   117   98    98    96    90    112   108   104   100


AESTR is now on loan to the Monterey Institute for Speech
and Hearing for field testing. Their preliminary operation
of the apparatus in conjunction with an M&M candy dispenser
revealed a new problem. Candy disks were being dispensed at
a very rapid rate since the relay opened and closed every time
the voice quivered in and out of the desired sound spectral
window. Therefore, to make AESTR provide only one reward item
with each sustained sound, the AND circuit was modified to
have a 250 millisecond delay before closing the relay contacts,
and once closed, the relay would not open for two seconds.
This modification consisted of choosing the correct shunt
capacitor values in the half-wave rectifier portion of the decision
and response circuit. A nominal value of 100 microfarads
working with the resistive elements of the circuit develops
build-up and decay time constants to meet the operating
specification for the relay.
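
The intended timing behavior (a 250 millisecond delay before closing
and a two second minimum closed time) can be illustrated with a
discrete-time sketch. This is a behavioral model only: it does not
simulate the RC build-up and decay of the actual circuit, whose
resistor values are not given here, and the sample interval, function
name and parameter names are assumptions made for the example.

```python
def relay_states(and_inputs, dt_ms, on_delay_ms=250, hold_ms=2000):
    """Relay state for each sample of the AND condition.

    and_inputs: booleans sampled every dt_ms milliseconds, True while
    both formants are inside the selected spectral window.
    """
    states = []
    true_ms = 0      # time the AND condition has been continuously true
    closed_ms = 0    # time elapsed since the relay closed
    closed = False
    for active in and_inputs:
        true_ms = true_ms + dt_ms if active else 0
        if closed:
            closed_ms += dt_ms
            # Open only after the hold time, and only once the sound stops.
            if closed_ms >= hold_ms and not active:
                closed = False
        elif true_ms >= on_delay_ms:
            closed = True
            closed_ms = 0
        states.append(closed)
    return states

# A quivering voice: in the window 100 ms, out 50 ms, repeatedly.
quiver = relay_states([True, True, False] * 10, dt_ms=50)
# A sustained sound of 500 ms followed by silence.
sustained = relay_states([True] * 10 + [False] * 50, dt_ms=50)
print(any(quiver), sum(sustained) * 50, "ms closed")
```

In this model a quivering input never closes the relay, while a
sustained sound closes it once and holds it for the full two seconds,
so only one reward item is dispensed.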

Dr. Gray and his associates tested AESTR for its ability
to discriminate the individual vowel sounds. The preliminary
results indicate that the machine, for certain vowels, will
give a positive response not only to the targeted vowel but
also to certain other vowel sounds. For example, AESTR can
be set to respond to OW and it will perform properly such that
the speaker is unable to cause a positive machine response
with any vowel sound other than OW. However, if AESTR is
targeted for the central vowel sound ER, the machine will
respond to ER plus the phonemes A, OW, U, OO and UH. The
apparent cause of this undesirable multi-sound response is
that ER has a relatively low intensity level
for its first and second formants when compared to back vowels,
especially OW. Unfortunately, the therapist has a greater
need to teach ER rather than OW to speech handicapped
children. To improve AESTR's ability to respond strictly to
the ER sound, Dr. Gray and the author varied the "pitch"
control settings. The attempt indicated that some improvement
could be made if the "pitch" control is set to position "D".
With this setting the machine will respond only to ER and OW. The
OW vowel continues to mask all other vowels since it does contain the
greatest amount of energy throughout the audio frequency
spectrum.

A different approach was tried to overcome the ER
ambiguity response of AESTR. Both the F1 and F2 local oscillators
were set to the second formant frequency of 1480 Hz while
the "pitch" control remained in position "D". The volume
control was set to a value of 3 and the sensitivity controls were
set to a value of [illegible]. In this state, the machine would respond
only to the ER sound for a majority of trials. This can be
explained by noting that OW has both its F1 and F2 frequencies
below 1 kHz, which are attenuated by the high-pass filter, and
the harmonic components of OW near 1480 Hz are insufficient
to cause a positive response of the machine. When a speaker
makes the ER sound, its second formant (near 1480 Hz) is not
attenuated by the high-pass filter and will provide a strong
beat frequency out of both F1 and F2 filters, thus causing
AESTR to give a positive response. This type of machine
operating procedure will be investigated further and extended to
take advantage of the third formant information associated
with each vowel.

A speaker is able to cause AESTR to give a positive response
when he greatly increases the intensity of his voice. The
author recommends that some type of distortionless speech
compressor be inserted between the microphone and preamplifier.
Commercial devices are readily available to control the
microphone peak loudness yield.

A human limitation prevents AESTR from being operated
for more than 15 minutes by one speaker. After a person has
been producing voiced sounds for this period of time, he will
start becoming hyperventilated and experience dizziness. The
effect is analogous to a person blowing up a large balloon.
Dr. Gray is giving consideration to this factor and will
develop a clinical testing procedure to avoid hyperventilation
of the speaker.

14.  CONCLUSIONS

The prototype apparatus does perform electrically in the
manner it was designed to operate but this does not imply that
AESTR is performing in a totally satisfactory manner from the
viewpoint of the speech therapist. AESTR is considered to be
approximately 50% successful in meeting the needs of the
therapist. With more operating data obtained from the machine
in future months, it is hoped that additional design criteria
can be established to improve AESTR's performance.

In  addition  to  aiding  speech  handicapped  children,  AESTR 
has  potential  applications  to  aid  persons  trying  to  learn 
foreign  vowel  sounds.  Also  this  apparatus  can  be  used  in  an 
auxiliary  manner  to  measure  tones  of  musical  instruments  such 
as  pianos  or  organs  with  a  high  degree  of  accuracy. 

Speech processing and especially specific analysis of
spectral components of voiced sounds is a challenging task
from an engineering viewpoint. This fact became very apparent
from what appeared to be a very straightforward thesis subject.

APPENDIX I

Selected  Glossary  of  Speech  Terms 

ARTICULATE.  To  produce  a  speech  sound  by  the  organs  of  speech. 

ARTICULATION.  The  set  of  human  bodily  positions  and  movements 
aiming  at  the  production  of  speech  sounds. 

BACK.  A  vowel  articulated  by  raising  the  back  part  of  the 
tongue  towards  the  velum,  e.g.  sort. 

CENTRAL.  A  vowel  articulated  by  raising  the  central  part  of 
the  tongue  towards  the  juncture  of  the  palate  and  the 
velum,  e.g.  first. 

CONSONANT.  A  speech  sound  articulated  by  a  complete  closure 
of  the  air  passage  or  by  a  narrowing  of  it  beyond  the 
vowel  limit,  e.g.  go,  or  see. 

DIPHTHONG.  A  vowel  articulated  by  a  deliberate  movement  of 
the  speech  organs  from  one  position  into  the  other. 

FRICATIVE.  A  consonant  articulated  by  a  narrowing  of  the  air- 
passage  resulting  in  the  audible  friction,  e.g.  shame. 

FRONT.  A  vowel  articulated  by  raising  the  front  part  of  the 
tongue  towards  the  palate,  e.g.  get. 

FULLY  VOICED.  A  speech  sound  articulated  by  the  vocal  cords 
vibrating  during  the  whole  of  its  articulation,  e.g. 
living  or  put. 

ORGANS OF SPEECH. Those parts of the human body which are
active in the production of speech sounds, i.e. the lungs,
the trachea (windpipe), the vocal cords, the glottis,
the pharynx, the nose, the lips, the teeth, the alveoli
(teeth ridge), the palate (hard palate), the velum (soft
palate), the uvula, the tongue. The tongue is arbitrarily
divided into four parts: the tip, the blade, the
center and the back.

PHONEME.  A  class  of  distinctive  speech  sounds,  the  members 
of  which  are  (1)  in  complementary  distribution  with 
each  other,  and  (2)  in  opposition  or  contrast  to  any 
other  class  of  distinctive  speech  sounds.  Thus,  /d/  in 
read  and  /d/  in  middle  are  members  of  the  same  phoneme, 
whereas  /d/  in  date  and  /l/  in  late  are  members  of  two 
different  phonemes. 

PHONEMICS.  The  scientific  study  of  distinctive  speech  sounds. 

PHONETICS.  The  scientific  study  of  speech  sounds. 

PLOSIVE.  A  consonant  articulated  by  a  complete  closure  of  the  air
passage,  combined  with  air  compression  behind  the  closure,  and
followed  by  an  explosion  in  the  release  stage,  e.g.  kind.

SPEECH.  A  sequence  of  sounds  articulated  for  the  purpose  of  human
communication.

SYLLABLE.  A  structural  unit  capable  of  being  connected  as  a  whole 
with  one  particular  degree  of  accent,  e.g.  become. 

VELUM.  The  soft  palate  of  the  oral  cavity. 

VOICED.  A  speech  sound,  consonant  or  vowel,  articulated  with  the 
vocal  cords  vibrating  during  the  whole  of  its  articulation,  or 
part  of  it,  e.g.  weather,  park,  one.

VOICELESS.  A  speech  sound,  especially  a  consonant,  articulated  with 
no  voicing,  e.g.  lucky.

VOWEL.  A  speech  sound  articulated  with  no  closure  of  the  air-passage 
and  no  narrowing  of  it  beyond  the  vowel  limit,  e.g.  bad  or  most. 

WORD.  A  structural  unit  separated  in  writing  by  spaces,  e.g.  bed 
(one  word),  room  (one  word),  bedroom  (one  word),  textbook 
(one  word),  a  good  subject  (three  words). 

APPENDIX  II
PHOTOGRAPHS  OF  PROTOTYPE  EQUIPMENT 

Figure  19.  AESTR  Circuit  Board.  Labeled  sections  in  the  photograph:  preamplifier  and  low  pass  filters.